pacman::p_load(arrow, lubridate, tidyverse, tmap, sf, spatstat, patchwork, raster, spNetwork, gifski)
1.0 Introduction
Human mobility, the spatial-temporal dynamics of human movements, serves as a critical reflection of societal patterns and human behaviors. With the advancement and pervasiveness of Information and Communication Technologies (ICT) in our daily life, especially smartphones, a large volume of data related to human mobility has been collected. These data provide valuable insights into how individuals and populations move within and between different geographical locations. With appropriate GIS analysis methods, these data can be turned into valuable insights for predicting future mobility trends and developing more efficient and sustainable strategies for managing urban mobility.
In this study, I will apply Spatial Point Patterns Analysis methods to discover the geographical and spatio-temporal distribution of Grab hailing services locations in Singapore. In order to determine the geographical and spatio-temporal patterns of the Grab hailing services, I will develop traditional Kernel Density Estimation (KDE) and Temporal Network Kernel Density Estimation (TNKDE). KDE layers will help identify the areas with high concentration of Grab hailing services, providing insights into the demand and popularity of these services in different parts of Singapore. TNKDE, on the other hand, will allow for analysis of how the distribution of Grab hailing services changes over time, revealing temporal patterns and trends in their usage. These spatial and spatio-temporal analyses will contribute to a better understanding of the dynamics and effectiveness of Grab’s mobility services in Singapore.
2.0 Literature Review of Spatial Point Pattern Analysis
Spatial point pattern analysis is concerned with the description, statistical characterization, modelling and visualisation of point patterns over space, and with making inference about the process that could have generated an observed pattern (Boots & Getis, 1988; Rey et al., 2023; Pebesma & Bivand, 2023). According to this theory, the empirical spatial distributions of points in our daily life are not controlled by sampling, but are the result of an underlying geographically-continuous process (Rey et al., 2023). For example, a COVID-19 cluster did not happen by chance, but arose from a spatial process of close-contact infection.
When analysing real-world spatial points, it is important to determine whether the observed points are randomly distributed or patterned due to a process or interaction (Floch et al., 2018). In a "completely random" distribution, points are located everywhere with the same probability and independently of each other. On the other hand, spatial points can be clustered or dispersed due to an underlying point process. However, it is challenging to detect whether a spatial point pattern exists through heuristic observation and intuitive interpretation alone (Baddeley et al., 2015; Floch et al., 2018). Hence, spatial point pattern analysis can be used to detect spatial concentration or dispersion phenomena.

When analysing and interpreting the properties of a point pattern, these properties can be categorised into two groups: (a) first-order properties and (b) second-order properties (Yuan et al., 2020; Gimond, 2023). First-order properties concern the characteristics of individual point locations and the variation of their density across space (Gimond, 2023). Under this conception, observations vary from point to point due to changes in the underlying property. Second-order properties focus not only on individual points, but also on the interactions between points and their influences on one another (Gimond, 2023). Under this conception, observations vary from place to place due to interaction effects between observations. First-order properties of point patterns are mostly addressed by density-based techniques, such as quadrat analysis and kernel density estimation, whereas distance-based techniques, such as the nearest neighbour index and K-functions, are often used to analyse second-order properties since they take into account the distance between point pairs (Yuan et al., 2020; Gimond, 2023).
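To make these two flavours concrete, here is a small, self-contained base-R sketch (simulated points, not part of the Grab analysis): a quadrat count with a variance-to-mean ratio (VMR) for the density-based view, and a Clark-Evans-style nearest-neighbour ratio for the distance-based view. Under complete spatial randomness both ratios are close to 1; clustering pushes the VMR above 1 and the nearest-neighbour ratio below 1.

```r
set.seed(1234)

# Simulate 200 points uniformly in the unit square (a CSR-like pattern)
n <- 200
x <- runif(n)
y <- runif(n)

# --- Density-based (first-order): quadrat counts on a 4 x 4 grid ---
qx <- cut(x, breaks = seq(0, 1, length.out = 5), include.lowest = TRUE)
qy <- cut(y, breaks = seq(0, 1, length.out = 5), include.lowest = TRUE)
counts <- as.vector(table(qx, qy))
vmr <- var(counts) / mean(counts)     # ~1 under CSR, >1 clustered, <1 dispersed

# --- Distance-based (second-order): Clark-Evans nearest-neighbour ratio ---
d <- as.matrix(dist(cbind(x, y)))
diag(d) <- Inf
nn <- apply(d, 1, min)                # nearest-neighbour distance per point
ce <- mean(nn) / (1 / (2 * sqrt(n)))  # ~1 under CSR, <1 clustered, >1 dispersed
```

In the Grab analysis below, spatstat provides production-quality versions of both ideas (quadratcount(), clarkevans(), density(), Kest()); this sketch only illustrates the underlying logic.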
3.0 Importing Packages
Before we start the exercise, we will need to import necessary R packages first. We will use the following packages:
- arrow: for reading and writing Apache Parquet files
- lubridate: for working with temporal data (dates and times)
- spatstat: a package for statistical analysis of spatial data, specifically spatial point pattern analysis. This package was provided by Baddeley, Turner and Rubak (2015) and gives a comprehensive list of functions to perform first- and second-order spatial point patterns analysis and to derive kernel density estimation (KDE) layers.
- rgdal: used to import geospatial data and output it as spatial class objects from the sp package
- raster: reads, writes, manipulates, analyses and models gridded spatial data (i.e. rasters). In this exercise, it will be used to convert image output generated by spatstat into raster format.
- maptools, ggplot2, ggthemes, plotly: packages used to plot interactive visualisations, summary statistics and KDE layers
4.0 Importing Datasets into R Environment
4.1 Datasets
In this study, we will use the Grab-Posisi dataset, a comprehensive GPS trajectory dataset for car-hailing services in Southeast Asia. Apart from the time and location of the object, GPS trajectories are also characterised by other parameters such as speed, heading direction, the area and distance covered during travel, and the travel time. Thus, the trajectory patterns derived from users' GPS data are a valuable source of information for a wide range of urban applications, such as solving transportation problems, traffic prediction, and developing sound urban planning.
Moreover, we will also use OpenStreetMap data, an open, crowd-sourced geospatial dataset that includes shapefiles of important layers such as road networks, forests, building footprints and many other points of interest.
To extract the Singapore boundary, we will use Master Plan 2019 Subzone Boundary (No Sea), provided by data.gov.sg.
4.2 Importing Grab-Posisi Dataset
Each trajectory in Grab-Posisi dataset is serialised in a file in Apache Parquet format. The Singapore portion of the dataset is packaged into a total of 10 Parquet files.
Firstly, we will use the read_parquet() function from the arrow package, which allows us to read Parquet files into the R environment as a data frame (more specifically, a tibble).
df <- read_parquet('~/IS415-GAA/data/GrabPosisi/part-00000.snappy.parquet',as_data_frame = TRUE)
df1 <- read_parquet('~/IS415-GAA/data/GrabPosisi/part-00001.snappy.parquet',as_data_frame = TRUE)
df2 <- read_parquet('~/IS415-GAA/data/GrabPosisi/part-00002.snappy.parquet',as_data_frame = TRUE)
df3 <- read_parquet('~/IS415-GAA/data/GrabPosisi/part-00003.snappy.parquet',as_data_frame = TRUE)
df4 <- read_parquet('~/IS415-GAA/data/GrabPosisi/part-00004.snappy.parquet',as_data_frame = TRUE)
df5 <- read_parquet('~/IS415-GAA/data/GrabPosisi/part-00005.snappy.parquet',as_data_frame = TRUE)
df6 <- read_parquet('~/IS415-GAA/data/GrabPosisi/part-00006.snappy.parquet',as_data_frame = TRUE)
df7 <- read_parquet('~/IS415-GAA/data/GrabPosisi/part-00007.snappy.parquet',as_data_frame = TRUE)
df8 <- read_parquet('~/IS415-GAA/data/GrabPosisi/part-00008.snappy.parquet',as_data_frame = TRUE)
df9 <- read_parquet('~/IS415-GAA/data/GrabPosisi/part-00009.snappy.parquet',as_data_frame = TRUE)
To consolidate all trajectory instances into a single data frame, we'll vertically bind all 10 imported data frames using the bind_rows() function from the tidyverse (dplyr) package.
df_trajectories <- bind_rows(df,df1,df2,df3,df4,df5,df6,df7,df8,df9)
To get a quick overview of the dataset, we'll first check the number of trajectory instances using the dim() function. Then, we'll use the head() function to quickly scan through the data columns and values.
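As a side note, the ten read_parquet() calls can be replaced by a short loop. A minimal sketch, assuming the part-0000N.snappy.parquet naming pattern shown above (the actual read is commented out since it depends on the local data path):

```r
# Build the ten file paths following the part-0000N.snappy.parquet pattern
parts <- sprintf("~/IS415-GAA/data/GrabPosisi/part-%05d.snappy.parquet", 0:9)

# Read each part and bind the rows in one step (uncomment to run locally):
# df_trajectories <- dplyr::bind_rows(lapply(parts, arrow::read_parquet))

basename(parts)
```

This scales to any number of parts without repeating code, and avoids keeping ten intermediate data frames (df, df1, ..., df9) in memory.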
dim(df_trajectories)
[1] 30329685        9
head(df_trajectories)
# A tibble: 6 × 9
  trj_id driving_mode osname  pingtimestamp rawlat rawlng speed bearing accuracy
  <chr>  <chr>        <chr>           <int>  <dbl>  <dbl> <dbl>   <int>    <dbl>
1 70014  car          android    1554943236   1.34   104.  18.9     248      3.9
2 73573  car          android    1555582623   1.32   104.  17.7      44      4
3 75567  car          android    1555141026   1.33   104.  14.0      34      3.9
4 1410   car          android    1555731693   1.26   104.  13.0     181      4
5 4354   car          android    1555584497   1.28   104.  14.8      93      3.9
6 32630  car          android    1555395258   1.30   104.  23.2      73      3.9
From the result above, we can see that the dataset includes a total of 30,329,685 trajectory instances, each with 9 columns as follows:
| Column Name | Data Type | Remark |
|---|---|---|
| trj_id | chr | Trajectory ID |
| driving_mode | chr | Mode of Driving |
| osname | chr | Operating system of the device (e.g. android) |
| pingtimestamp | int | Data Recording Timestamp |
| rawlat | num | Latitude Value (WGS-84) |
| rawlng | num | Longitude Value (WGS-84) |
| speed | num | Speed |
| bearing | int | Bearing |
| accuracy | num | Accuracy |
From the above table, it can be seen that pingtimestamp is recorded as int. We need to convert this column to a proper datetime format to derive meaningful temporal insights from the data. To do so, we will use the as_datetime() function from the lubridate package.
df_trajectories$pingtimestamp <- as_datetime(df_trajectories$pingtimestamp)
4.3 Importing OpenStreetMap road data for Malaysia, Singapore and Brunei
The gis_osm_roads_free_1 dataset, which we downloaded from OpenStreetMap, is in ESRI shapefile format. To use this data in an R-environment, we need to import it as an sf object. We can do this using the st_read() function of the sf package. This function reads the shapefile data and returns an sf object that can be used for further analysis.
osm_road_sf <- st_read(dsn = "~/IS415-GAA/data/geospatial",
                   layer = "gis_osm_roads_free_1") %>% st_transform(crs = 3414)
4.4 Importing Singapore Master Plan Planning Subzone boundary data
The MPSZ-2019 dataset, which we downloaded from data.gov.sg, is in ESRI shapefile format. To use this data in the R environment, we need to import it as an sf object. We can do this using the st_read() function of the sf package. This function reads the shapefile data and returns an sf object that can be used for further analysis.
mpsz_sf <- st_read(dsn = "~/IS415-GAA/data/geospatial",
                layer = "MPSZ-2019") %>% st_transform(crs = 3414)
Reading layer `MPSZ-2019' from data source
`/Users/khantminnaing/IS415-GAA/data/geospatial' using driver `ESRI Shapefile'
Simple feature collection with 332 features and 6 fields
Geometry type: MULTIPOLYGON
Dimension: XY
Bounding box: xmin: 103.6057 ymin: 1.158699 xmax: 104.0885 ymax: 1.470775
Geodetic CRS: WGS 84
In the code chunk above, the %>% operator is used to pipe the output of st_read() to the st_transform() function. Since the dataset we are using is the Singapore boundary, we need to assign the standard projected coordinate reference system for Singapore, which is SVY21 (EPSG:3414). The st_transform() function transforms the coordinate reference system of the sf object to EPSG:3414.
After importing the dataset, we will plot it to see how it looks. The plot() function is used to plot the geometry of the sf object. The st_geometry() function is used to extract the geometry of the mpsz_sf object.
plot(st_geometry(mpsz_sf))
5.0 Data Wrangling
Data wrangling is the process of converting and transforming raw data into a usable form and is carried out prior to conducting any data analysis.
5.1 Extracting Trip Starting Locations and Temporal Data Values from Grab-Posisi dataset
Firstly, we will extract the trip starting locations for all unique trajectories in the dataset and store them in a new data frame named origin_df. We are also interested in obtaining valuable temporal data such as the day of the week, the hour, and the day of the month. To do so, we will use the following functions from the lubridate package, and add the newly derived values as columns to origin_df.
- wday(): extracts the day-of-week component of a date-time
- hour(): extracts the hour component of a date-time
- mday(): extracts the day-of-month component of a date-time
origin_df <- df_trajectories %>%
group_by(trj_id) %>%
arrange(pingtimestamp) %>%
filter(row_number()==1) %>%
mutate(weekday = wday(pingtimestamp,
label=TRUE,
abbr=TRUE),
pickup_hr = factor(hour(pingtimestamp)),
         day = factor(mday(pingtimestamp)))
5.2 Extracting Trip Ending Locations and Temporal Data Values from Grab-Posisi dataset
Similar to what we did in the previous section, we are also interested in extracting trip ending locations and the associated temporal data into a new data frame called destination_df. We will use the same functions as before.
destination_df <- df_trajectories %>%
group_by(trj_id) %>%
arrange(desc(pingtimestamp)) %>%
filter(row_number()==1) %>%
mutate(weekday = wday(pingtimestamp,
label=TRUE,
abbr=TRUE),
dropoff_hr = factor(hour(pingtimestamp)),
         day = factor(mday(pingtimestamp)))
The arrange() function sorts the timestamps in ascending order by default, and this default order is applied to origin_df. For destination_df, however, arrange(desc()) is used to change the order to descending, so that the latest ping of each trajectory comes first.
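The first-row-after-sorting logic can be illustrated with a toy base-R sketch, using order() in place of arrange() (the timestamps here are made up for illustration):

```r
# Toy pings (as plain numbers) for a single trajectory
ts <- c(300, 100, 200)

# Ascending sort, then take the first element: the earliest ping (origin)
first_ping <- ts[order(ts)][1]

# Descending sort, then take the first element: the latest ping (destination)
last_ping <- ts[order(ts, decreasing = TRUE)][1]
```

Taking row_number() == 1 after arrange() (or arrange(desc())) within each group is exactly this pattern applied per trajectory.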
5.3 Converting to sf tibble data.frame
We will use the st_as_sf() function to convert the data frames into sf objects, specifying WGS 84 (EPSG:4326) as the source coordinate reference system and then transforming to SVY21 (EPSG:3414).
origin_sf <- st_as_sf(origin_df,
coords = c("rawlng", "rawlat"),
crs = 4326) %>%
st_transform(crs = 3414)
destination_sf <- st_as_sf(destination_df,
coords = c("rawlng", "rawlat"),
crs = 4326) %>%
  st_transform(crs = 3414)
5.4 Creating a Coastal Outline of Singapore
The original mpsz_sf dataset we imported includes information on all URA Master Plan planning subzone boundaries. However, for this analysis, we only need the national-level boundary of Singapore. Hence, we will need to union all the subzone boundaries into one single polygon boundary. Also, the Grab ride-hailing service is only available on the main island of Singapore, so we will need to remove the outer islands where the service is not available. In particular, we will remove the following planning subzones: NORTH-EASTERN ISLANDS, SOUTHERN GROUP, SUDONG and SEMAKAU.
We can remove these subzones using the subset() function. The subset() function is used to extract rows from a data frame that meet certain conditions.
northeasten.islands <- subset(mpsz_sf, mpsz_sf$SUBZONE_N == "NORTH-EASTERN ISLANDS")
southern.islands <- subset(mpsz_sf, mpsz_sf$SUBZONE_N == "SOUTHERN GROUP")
sudong <- subset(mpsz_sf, mpsz_sf$SUBZONE_N == "SUDONG")
semakau <- subset(mpsz_sf,mpsz_sf$SUBZONE_N == "SEMAKAU")
outerislands <- dplyr::bind_rows(list(northeasten.islands,southern.islands,sudong,semakau))
In the code chunk above, we first created four new data frames called northeasten.islands, southern.islands, sudong, and semakau by selecting the rows from mpsz_sf where the value in the SUBZONE_N column matches the corresponding subzone name.
After that, we used bind_rows() function from the dplyr package to combine these four data frames into a single data frame called outerislands.
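As a side note, the four subset() calls and the bind_rows() step can be collapsed into a single selection using the %in% operator. A minimal sketch (the commented line shows the one-step equivalent, which depends on the mpsz_sf object from above):

```r
# The outer-island subzones to exclude, as listed above
outer_names <- c("NORTH-EASTERN ISLANDS", "SOUTHERN GROUP", "SUDONG", "SEMAKAU")

# One-step equivalent of the four subset() calls plus bind_rows():
# outerislands <- subset(mpsz_sf, SUBZONE_N %in% outer_names)

# %in% returns a logical vector marking which values match:
c("SUDONG", "BEDOK", "SEMAKAU") %in% outer_names
```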
After extracting the outer islands, we will plot them to see how they look.
plot(st_geometry(outerislands))
As mentioned earlier, we only need the national-level boundary of Singapore, without the outer islands. To achieve this, we will process the mpsz_sf layer as follows:
- First, use the st_union() function from the sf package to combine the geometries of the mpsz_sf and outerislands sf objects into a single geometry each.
- Next, use the st_difference() function to remove the overlapping areas between the two geometries.
- Finally, store the non-overlapping area in a new sf object called sg_sf.
sg_sf <- st_difference(st_union(mpsz_sf), st_union(outerislands))
To assess whether the geometry of the newly created sg_sf matches our intended outcome, we will plot it out.
plot(st_geometry(sg_sf))
5.5 Extracting Road Layers within Singapore
As we have seen in Section 4.3, the osm_road_sf dataset includes road networks not only from Singapore, but also from Malaysia and Brunei. However, our analysis is focused on Singapore, so we will need to remove the unnecessary data rows. To do so, we will use the st_intersection() function from the sf package to clip the road network to the Singapore boundary.
sg_road_sf <- st_intersection(osm_road_sf, sg_sf)
Next, we will look at the classification of road networks as provided by OpenStreetMap.
unique(sg_road_sf$fclass)
 [1] "primary" "residential" "tertiary" "footway"
[5] "service" "secondary" "motorway" "motorway_link"
[9] "trunk" "trunk_link" "primary_link" "pedestrian"
[13] "living_street" "unclassified" "steps" "track_grade2"
[17] "track" "secondary_link" "cycleway" "path"
[21] "tertiary_link" "track_grade1" "track_grade3" "unknown"
[25] "track_grade5" "bridleway" "track_grade4"
Looking at the road classification, it is observed that not all categories are relevant to our analysis, which is primarily concerned with driving networks where taxis can facilitate pick-ups or drop-offs. Hence, we will implement a filtering process on the dataset to exclude road segments that fall outside the scope of our analysis.
Firstly we will specify the road classes that we want to retain.
driving_classes <- c("primary", "primary_link", "residential", "secondary", "secondary_link", "service", "tertiary", "tertiary_link")
Next, we will filter the sg_road_sf object to remove all rows that do not have one of our desired fclass attribute values.
sg_driving_sf <- sg_road_sf %>%
filter(fclass %in% driving_classes)
unique(sg_driving_sf$fclass)
[1] "primary"        "residential"    "tertiary"       "service"
[5] "secondary" "primary_link" "secondary_link" "tertiary_link"
Why motorway and motorway_link classes are not included?
According to OpenStreetMap, the fclass value motorway refers to expressways. In Singapore, stopping or parking a vehicle on an expressway is illegal under the Road Traffic Act. Hence, motorway (and motorway_link) roads are not relevant for the network-constrained kernel density estimation (NKDE) analysis that we will carry out later.
Now that we have filtered the dataset, we will plot the driving road network of Singapore using tmap.
tmap_mode("plot")
tm_shape(sg_sf) +
tm_polygons() +
tm_shape(sg_driving_sf) +
tm_lines(col="fclass", palette ="viridis") +
tm_layout(main.title = "Road Network in Singapore",
main.title.position = "center",
main.title.size = 1.2,
legend.outside = TRUE,
frame = TRUE) +
tm_borders(alpha = 0.5)
5.6 Converting the Simple Features to Planar Point Pattern Objects
In order to use the capabilities of the spatstat package, a spatial dataset should be converted into an object of class ppp (planar point pattern) (Baddeley et al., 2015). A point pattern object contains the spatial coordinates of the points, the marks attached to the points (if any), the window in which the points were observed, and the name of the unit of length for the spatial coordinates. Thus, a single object of class ppp contains all the information required to perform spatial point pattern analysis.
In the previous sections, we created sf objects of the Grab trajectory origin and destination points. Now, we will convert them into ppp objects using the as.ppp() function from the spatstat package.
origin_ppp <- as.ppp(st_coordinates(origin_sf), st_bbox(origin_sf))
par(mar = c(0,0,1,0))
plot(origin_ppp)
The code chunk above converts the origin_sf object to a point pattern object of class ppp. st_coordinates() function is used to extract the coordinates of the origin_sf object and st_bbox() function is used to extract the bounding box of the origin_sf object. The resulting object origin_ppp is a point pattern object of class ppp.
destination_ppp <- as.ppp(st_coordinates(destination_sf), st_bbox(destination_sf))
par(mar = c(0,0,1,0))
plot(destination_ppp)
5.7 Handling Data Errors
Before going straight into the analysis, we will need to take a quick look at the summary statistics of the newly created ppp objects. This is an important step to ensure that the data is free of errors and that a reliable analysis can be performed.
5.7.1 Data Error Handling for origin_ppp
We will use summary() function to get summary information of origin_ppp object.
summary(origin_ppp)
Planar point pattern: 28000 points
Average intensity 2.473666e-05 points per square unit
Coordinates are given to 3 decimal places
i.e. rounded to the nearest multiple of 0.001 units
Window: rectangle = [3628.24, 49845.23] x [25198.14, 49689.64] units
(46220 x 24490 units)
Window area = 1131920000 square units
We can also check whether there are any duplicated points in the origin_ppp object using the any() and duplicated() functions.
any(duplicated(origin_ppp))
[1] FALSE
The output is FALSE, which means there is no duplication of point coordinates in the origin_ppp object.
Why do we need to check duplication?
When analyzing spatial point processes, it is important to avoid duplication of points. This is because statistical methodology for spatial point processes is based largely on the assumption that processes are simple, i.e., that points of the process can never be coincident. When the data have coincident points, some statistical procedures designed for simple point processes will be severely affected (Baddeley et al., 2015).
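Had duplicates been found, a common remedy is to break the ties by adding a tiny random perturbation to the coincident points (spatstat provides rjitter() for ppp objects). A minimal base-R sketch of the idea, using toy coordinates rather than the Grab data:

```r
set.seed(42)

# Toy coordinates: rows 2 and 3 are coincident points
pts <- cbind(x = c(1, 2, 2, 3), y = c(1, 5, 5, 7))
dup <- duplicated(pts)     # flags row 3 as a repeat of row 2

# Break the tie with a tiny random offset (the essence of jittering)
pts[dup, ] <- pts[dup, ] + runif(sum(dup) * 2, min = -1e-3, max = 1e-3)

any(duplicated(pts))       # no coincident points remain
```

The offset should be small relative to the coordinate precision so that the jittered points stay in essentially the same location.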
5.7.2 Data Error Handling for destination_ppp
We will use summary() function to get summary information of destination_ppp object.
summary(destination_ppp)
Planar point pattern: 28000 points
Average intensity 2.493661e-05 points per square unit
Coordinates are given to 3 decimal places
i.e. rounded to the nearest multiple of 0.001 units
Window: rectangle = [3637.21, 49870.63] x [25221.3, 49507.79] units
(46230 x 24290 units)
Window area = 1122850000 square units
We can also check whether there are any duplicated points in the destination_ppp object using the any() and duplicated() functions.
any(duplicated(destination_ppp))
[1] FALSE
The code output is FALSE, which means there are no duplication of point coordinates in the destination_ppp object.
5.8 Creating Observation Windows
Many data types in spatstat require us to specify the region of space inside which the data were observed. This is the observation window and it is represented by an object of class owin. In this analysis, our study area is Singapore, hence we will use Singapore boundary as the observation window for spatial point pattern analysis.
In Section 5.4, we have already created the sg_sf object, which represents the Singapore boundary (without outer islands). To convert this sf object to owin object, we will use as.owin() function from spatstat package.
sg_owin <- as.owin(sg_sf)
plot.owin(sg_owin)
We will use summary() function to get summary information of sg_owin object.
summary(sg_owin)
Window: polygonal boundary
56 separate polygons (36 holes)
vertices area relative.area
polygon 1 15307 7.00834e+08 9.92e-01
polygon 2 285 1.61128e+06 2.28e-03
polygon 3 27 1.50315e+04 2.13e-05
polygon 4 (hole) 41 -4.01660e+04 -5.69e-05
polygon 5 (hole) 317 -5.11280e+04 -7.24e-05
polygon 6 (hole) 3 -4.14099e-04 -5.86e-13
polygon 7 30 2.80002e+04 3.97e-05
polygon 8 (hole) 4 -2.86396e-01 -4.06e-10
polygon 9 (hole) 3 -1.81439e-04 -2.57e-13
polygon 10 (hole) 3 -8.68789e-04 -1.23e-12
polygon 11 (hole) 3 -5.99535e-04 -8.49e-13
polygon 12 (hole) 3 -3.04561e-04 -4.31e-13
polygon 13 (hole) 3 -4.46076e-04 -6.32e-13
polygon 14 (hole) 3 -3.39794e-04 -4.81e-13
polygon 15 (hole) 3 -4.52043e-05 -6.40e-14
polygon 16 (hole) 3 -3.90173e-05 -5.53e-14
polygon 17 (hole) 3 -9.59850e-05 -1.36e-13
polygon 18 (hole) 4 -2.54488e-04 -3.60e-13
polygon 19 (hole) 4 -4.28453e-01 -6.07e-10
polygon 20 (hole) 4 -2.18616e-04 -3.10e-13
polygon 21 (hole) 5 -2.44411e-04 -3.46e-13
polygon 22 (hole) 5 -3.64686e-02 -5.16e-11
polygon 23 71 8.18750e+03 1.16e-05
polygon 24 (hole) 6 -8.37554e-01 -1.19e-09
polygon 25 (hole) 38 -7.79904e+03 -1.10e-05
polygon 26 (hole) 3 -3.41897e-05 -4.84e-14
polygon 27 (hole) 3 -3.65499e-03 -5.18e-12
polygon 28 (hole) 3 -4.95057e-02 -7.01e-11
polygon 29 91 1.49663e+04 2.12e-05
polygon 30 (hole) 3 -3.79135e-02 -5.37e-11
polygon 31 (hole) 5 -2.92235e-04 -4.14e-13
polygon 32 (hole) 3 -7.43616e-06 -1.05e-14
polygon 33 (hole) 270 -1.21455e+03 -1.72e-06
polygon 34 (hole) 19 -4.39650e+00 -6.23e-09
polygon 35 (hole) 35 -1.38385e+02 -1.96e-07
polygon 36 (hole) 23 -1.99656e+01 -2.83e-08
polygon 37 40 1.38607e+04 1.96e-05
polygon 38 (hole) 41 -6.00381e+03 -8.50e-06
polygon 39 (hole) 7 -1.40546e-01 -1.99e-10
polygon 40 (hole) 11 -8.36705e+01 -1.18e-07
polygon 41 (hole) 3 -2.33435e-03 -3.31e-12
polygon 42 45 2.51218e+03 3.56e-06
polygon 43 139 3.22293e+03 4.56e-06
polygon 44 148 3.10395e+03 4.40e-06
polygon 45 (hole) 4 -1.72650e-04 -2.44e-13
polygon 46 75 1.73526e+04 2.46e-05
polygon 47 83 5.28920e+03 7.49e-06
polygon 48 106 3.04104e+03 4.31e-06
polygon 49 264 1.50631e+06 2.13e-03
polygon 50 71 5.63061e+03 7.97e-06
polygon 51 10 1.99717e+02 2.83e-07
polygon 52 (hole) 3 -1.37223e-02 -1.94e-11
polygon 53 487 2.06117e+06 2.92e-03
polygon 54 65 8.42861e+04 1.19e-04
polygon 55 47 3.82087e+04 5.41e-05
polygon 56 22 6.74651e+03 9.55e-06
enclosing rectangle: [2667.54, 55941.94] x [21448.47, 50256.33] units
(53270 x 28810 units)
Window area = 706156000 square units
Fraction of frame area: 0.46
5.9 Combining ppp objects and owin object
Earlier, we created two ppp objects, origin_ppp and destination_ppp, each representing the spatial points of the Grab trajectory origins and destinations, as well as an owin object called sg_owin, which represents the observation window of our analysis.
The observation window sg_owin can be combined with the point patterns origin_ppp and destination_ppp, so that the custom window replaces the default rectangular extent seen earlier.
origin_ppp_sg = origin_ppp[sg_owin]
destination_ppp_sg = destination_ppp[sg_owin]
par(mar = c(0,0,1,0))
plot(origin_ppp_sg)
plot(destination_ppp_sg)
We will use summary() function to get summary information of the newly created origin_ppp_sg object and destination_ppp_sg object.
summary(origin_ppp_sg)
Planar point pattern: 28000 points
Average intensity 3.965129e-05 points per square unit
Coordinates are given to 3 decimal places
i.e. rounded to the nearest multiple of 0.001 units
Window: polygonal boundary
56 separate polygons (36 holes)
vertices area relative.area
polygon 1 15307 7.00834e+08 9.92e-01
polygon 2 285 1.61128e+06 2.28e-03
polygon 3 27 1.50315e+04 2.13e-05
polygon 4 (hole) 41 -4.01660e+04 -5.69e-05
polygon 5 (hole) 317 -5.11280e+04 -7.24e-05
polygon 6 (hole) 3 -4.14099e-04 -5.86e-13
polygon 7 30 2.80002e+04 3.97e-05
polygon 8 (hole) 4 -2.86396e-01 -4.06e-10
polygon 9 (hole) 3 -1.81439e-04 -2.57e-13
polygon 10 (hole) 3 -8.68789e-04 -1.23e-12
polygon 11 (hole) 3 -5.99535e-04 -8.49e-13
polygon 12 (hole) 3 -3.04561e-04 -4.31e-13
polygon 13 (hole) 3 -4.46076e-04 -6.32e-13
polygon 14 (hole) 3 -3.39794e-04 -4.81e-13
polygon 15 (hole) 3 -4.52043e-05 -6.40e-14
polygon 16 (hole) 3 -3.90173e-05 -5.53e-14
polygon 17 (hole) 3 -9.59850e-05 -1.36e-13
polygon 18 (hole) 4 -2.54488e-04 -3.60e-13
polygon 19 (hole) 4 -4.28453e-01 -6.07e-10
polygon 20 (hole) 4 -2.18616e-04 -3.10e-13
polygon 21 (hole) 5 -2.44411e-04 -3.46e-13
polygon 22 (hole) 5 -3.64686e-02 -5.16e-11
polygon 23 71 8.18750e+03 1.16e-05
polygon 24 (hole) 6 -8.37554e-01 -1.19e-09
polygon 25 (hole) 38 -7.79904e+03 -1.10e-05
polygon 26 (hole) 3 -3.41897e-05 -4.84e-14
polygon 27 (hole) 3 -3.65499e-03 -5.18e-12
polygon 28 (hole) 3 -4.95057e-02 -7.01e-11
polygon 29 91 1.49663e+04 2.12e-05
polygon 30 (hole) 3 -3.79135e-02 -5.37e-11
polygon 31 (hole) 5 -2.92235e-04 -4.14e-13
polygon 32 (hole) 3 -7.43616e-06 -1.05e-14
polygon 33 (hole) 270 -1.21455e+03 -1.72e-06
polygon 34 (hole) 19 -4.39650e+00 -6.23e-09
polygon 35 (hole) 35 -1.38385e+02 -1.96e-07
polygon 36 (hole) 23 -1.99656e+01 -2.83e-08
polygon 37 40 1.38607e+04 1.96e-05
polygon 38 (hole) 41 -6.00381e+03 -8.50e-06
polygon 39 (hole) 7 -1.40546e-01 -1.99e-10
polygon 40 (hole) 11 -8.36705e+01 -1.18e-07
polygon 41 (hole) 3 -2.33435e-03 -3.31e-12
polygon 42 45 2.51218e+03 3.56e-06
polygon 43 139 3.22293e+03 4.56e-06
polygon 44 148 3.10395e+03 4.40e-06
polygon 45 (hole) 4 -1.72650e-04 -2.44e-13
polygon 46 75 1.73526e+04 2.46e-05
polygon 47 83 5.28920e+03 7.49e-06
polygon 48 106 3.04104e+03 4.31e-06
polygon 49 264 1.50631e+06 2.13e-03
polygon 50 71 5.63061e+03 7.97e-06
polygon 51 10 1.99717e+02 2.83e-07
polygon 52 (hole) 3 -1.37223e-02 -1.94e-11
polygon 53 487 2.06117e+06 2.92e-03
polygon 54 65 8.42861e+04 1.19e-04
polygon 55 47 3.82087e+04 5.41e-05
polygon 56 22 6.74651e+03 9.55e-06
enclosing rectangle: [2667.54, 55941.94] x [21448.47, 50256.33] units
(53270 x 28810 units)
Window area = 706156000 square units
Fraction of frame area: 0.46
summary(destination_ppp_sg)
Planar point pattern: 27997 points
Average intensity 3.964704e-05 points per square unit
Coordinates are given to 3 decimal places
i.e. rounded to the nearest multiple of 0.001 units
Window: polygonal boundary
56 separate polygons (36 holes)
vertices area relative.area
polygon 1 15307 7.00834e+08 9.92e-01
polygon 2 285 1.61128e+06 2.28e-03
polygon 3 27 1.50315e+04 2.13e-05
polygon 4 (hole) 41 -4.01660e+04 -5.69e-05
polygon 5 (hole) 317 -5.11280e+04 -7.24e-05
polygon 6 (hole) 3 -4.14099e-04 -5.86e-13
polygon 7 30 2.80002e+04 3.97e-05
polygon 8 (hole) 4 -2.86396e-01 -4.06e-10
polygon 9 (hole) 3 -1.81439e-04 -2.57e-13
polygon 10 (hole) 3 -8.68789e-04 -1.23e-12
polygon 11 (hole) 3 -5.99535e-04 -8.49e-13
polygon 12 (hole) 3 -3.04561e-04 -4.31e-13
polygon 13 (hole) 3 -4.46076e-04 -6.32e-13
polygon 14 (hole) 3 -3.39794e-04 -4.81e-13
polygon 15 (hole) 3 -4.52043e-05 -6.40e-14
polygon 16 (hole) 3 -3.90173e-05 -5.53e-14
polygon 17 (hole) 3 -9.59850e-05 -1.36e-13
polygon 18 (hole) 4 -2.54488e-04 -3.60e-13
polygon 19 (hole) 4 -4.28453e-01 -6.07e-10
polygon 20 (hole) 4 -2.18616e-04 -3.10e-13
polygon 21 (hole) 5 -2.44411e-04 -3.46e-13
polygon 22 (hole) 5 -3.64686e-02 -5.16e-11
polygon 23 71 8.18750e+03 1.16e-05
polygon 24 (hole) 6 -8.37554e-01 -1.19e-09
polygon 25 (hole) 38 -7.79904e+03 -1.10e-05
polygon 26 (hole) 3 -3.41897e-05 -4.84e-14
polygon 27 (hole) 3 -3.65499e-03 -5.18e-12
polygon 28 (hole) 3 -4.95057e-02 -7.01e-11
polygon 29 91 1.49663e+04 2.12e-05
polygon 30 (hole) 3 -3.79135e-02 -5.37e-11
polygon 31 (hole) 5 -2.92235e-04 -4.14e-13
polygon 32 (hole) 3 -7.43616e-06 -1.05e-14
polygon 33 (hole) 270 -1.21455e+03 -1.72e-06
polygon 34 (hole) 19 -4.39650e+00 -6.23e-09
polygon 35 (hole) 35 -1.38385e+02 -1.96e-07
polygon 36 (hole) 23 -1.99656e+01 -2.83e-08
polygon 37 40 1.38607e+04 1.96e-05
polygon 38 (hole) 41 -6.00381e+03 -8.50e-06
polygon 39 (hole) 7 -1.40546e-01 -1.99e-10
polygon 40 (hole) 11 -8.36705e+01 -1.18e-07
polygon 41 (hole) 3 -2.33435e-03 -3.31e-12
polygon 42 45 2.51218e+03 3.56e-06
polygon 43 139 3.22293e+03 4.56e-06
polygon 44 148 3.10395e+03 4.40e-06
polygon 45 (hole) 4 -1.72650e-04 -2.44e-13
polygon 46 75 1.73526e+04 2.46e-05
polygon 47 83 5.28920e+03 7.49e-06
polygon 48 106 3.04104e+03 4.31e-06
polygon 49 264 1.50631e+06 2.13e-03
polygon 50 71 5.63061e+03 7.97e-06
polygon 51 10 1.99717e+02 2.83e-07
polygon 52 (hole) 3 -1.37223e-02 -1.94e-11
polygon 53 487 2.06117e+06 2.92e-03
polygon 54 65 8.42861e+04 1.19e-04
polygon 55 47 3.82087e+04 5.41e-05
polygon 56 22 6.74651e+03 9.55e-06
enclosing rectangle: [2667.54, 55941.94] x [21448.47, 50256.33] units
(53270 x 28810 units)
Window area = 706156000 square units
Fraction of frame area: 0.46
6.0 Exploratory Spatial Data Analysis
6.1 Creating Point Symbol Maps
tmap_mode("view")
tm_shape(origin_sf) +
tm_dots(alpha=0.4,
size=0.05)
tm_shape(destination_sf) +
tm_dots(alpha=0.4,
        size=0.05)
6.2 Measuring Central Tendency
Descriptive statistics are used in point pattern analysis to summarise a point pattern's basic properties, such as its central tendency and dispersion. The mean centre and the median centre are two commonly employed measures of central tendency (Gimond, 2019).
6.2.1 Mean Center
The mean center is the arithmetic average of the (x, y) coordinates of all points in the study area. Similar to the mean in statistical analysis, the mean center is influenced to a greater degree by outliers (Yuan et al., 2020).
origin_xy <- st_coordinates(origin_sf)
origin_mc <- apply(origin_xy, 2, mean)
destination_xy <- st_coordinates(destination_sf)
destination_mc <- apply(destination_xy, 2, mean)
origin_mc
       X        Y
28490.57 36939.04
destination_mc
       X        Y
28870.96 36590.49
The results show that the origin and destination mean centres are, respectively, (28490.57, 36939.04) and (28870.96, 36590.49). The two mean centres appear to be situated in close proximity to one another.
6.2.2 Median Center
Median center is the location that minimises the sum of distances required to travel to all points within an observation window. It can be calculated using an iterative procedure first presented by Kuhn and Kuenne (1962). The procedure begins at a predetermined initial point, such as the mean center, and then repeatedly updates the median center's coordinates (x', y') until the optimal value is reached. In contrast to the mean center, the median center offers a more robust measure of central tendency because it is less affected by outliers (Yuan et al., 2020).
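Each iteration of this Kuhn–Kuenne (Weiszfeld-type) procedure reweights the points by the inverse of their distance \(d_i\) to the current centre:

\[
x' = \frac{\sum_{i=1}^{n} x_i / d_i}{\sum_{i=1}^{n} 1 / d_i},
\qquad
y' = \frac{\sum_{i=1}^{n} y_i / d_i}{\sum_{i=1}^{n} 1 / d_i}
\]

Note that apply(xy, 2, median), as used in the code below, returns the coordinate-wise median, which is a common and faster approximation of this true median centre rather than the iterative solution itself.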
origin_medc <- apply(origin_xy, 2, median)
destination_medc <- apply(destination_xy, 2, median)
origin_medc
       X        Y
28553.17 36179.05
destination_medc
       X        Y
28855.04 35883.86
Based on the results, the median centres of origin and destination are, respectively, (28553.17, 36179.05) and (28855.04, 35883.86). The two median centres appear to be situated in close proximity to one another.
Moreover, the mean and median centres of the origin and destination points are similar to one another. This may imply that the distribution of the data is relatively balanced spatially, and that there is no significant difference in the spatial patterns between the origin and destination points. It also indicates that both the mean centre and the median centre are effective measures of central tendency for this data.
6.2.3 Plotting Mean and Median Centers
We can plot both results obtained from the previous section on the same plane for a better comparison of the mean and median centers.
par(mar = c(0,0,1,0))
plot(sg_sf, col='light grey', main="mean and median centers of origin_sf")
points(origin_xy, cex=.5)
points(cbind(origin_mc[1], origin_mc[2]), pch='*', col='red', cex=3)
points(cbind(origin_medc[1], origin_medc[2]), pch='*', col='purple', cex=3)
par(mar = c(0,0,1,0))
plot(sg_sf, col='light grey', main="mean and median centers of destination_sf")
points(destination_xy, cex=.5)
points(cbind(destination_mc[1], destination_mc[2]), pch='*', col='yellow', cex=3)
points(cbind(destination_medc[1], destination_medc[2]), pch='*', col='orange', cex=3)
6.3 Measuring Dispersion
6.3.1 Standard Distance
Standard distances are defined similarly to standard deviations. This indicator measures how dispersed a group of points is around its mean center (Gimond, 2023).
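In terms of the mean centre \((\bar{x}, \bar{y})\), the standard distance is:

\[
SD = \sqrt{\frac{\sum_{i=1}^{n}\left[(x_i - \bar{x})^2 + (y_i - \bar{y})^2\right]}{n}}
\]

The code below implements this formula directly, term by term.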
origin_sd <- sqrt(sum((origin_xy[,1] - origin_mc[1])^2 + (origin_xy[,2] - origin_mc[2])^2) / nrow(origin_xy))
destination_sd <- sqrt(sum((destination_xy[,1] - destination_mc[1])^2 + (destination_xy[,2] - destination_mc[2])^2) / nrow(destination_xy))
origin_sd
[1] 10187.88
destination_sd
[1] 9545.69
From the results, the origin and destination standard distances are 10187.88 and 9545.69, respectively. Hence, it appears that the origin points are more dispersed than the destination points.
However, it would be challenging to discern why the origin points are more dispersed without further analysis to determine the factors contributing to their greater dispersion. Since that is out of scope for this exercise, we will not explore it further.
6.3.2 Plotting Standard Distances
In this section, we will draw circles around the origin and destination mean centers, with radii equal to the standard distance values calculated earlier. This provides a visual representation of their dispersion and allows an intuitive comparison between them.
par(mar = c(0,0,1,0))
plot(sg_sf, col='light grey', main="standard distance of origin_sf")
points(origin_xy, cex=.5)
points(cbind(origin_mc[1], origin_mc[2]), pch='*', col='red', cex=3)
bearing <- 1:360 * pi/180
cx <- origin_mc[1] + origin_sd * cos(bearing)
cy <- origin_mc[2] + origin_sd * sin(bearing)
circle <- cbind(cx, cy)
lines(circle, col='red', lwd=2)
par(mar = c(0,0,1,0))
plot(sg_sf, col='light grey',main="standard distance of destination_sf")
points(destination_xy, cex=.5)
points(cbind(destination_mc[1], destination_mc[2]), pch='*', col='purple', cex=3)
bearing <- 1:360 * pi/180
cx <- destination_mc[1] + destination_sd * cos(bearing)
cy <- destination_mc[2] + destination_sd * sin(bearing)
circle <- cbind(cx, cy)
lines(circle, col='purple', lwd=2)
A better comparison of the standard distances between origin and destination points can also be achieved by trying to plot both results on the same plane.
par(mar = c(0,0,1,0))
plot(sg_sf, col='light grey',main="standard distances of origin_sf & destination_sf")
points(cbind(origin_mc[1], origin_mc[2]), pch='*', col='red', cex=3)
points(cbind(destination_mc[1], destination_mc[2]), pch='*', col='purple', cex=3)
bearing <- 1:360 * pi/180
origin_cx <- origin_mc[1] + origin_sd * cos(bearing)
origin_cy <- origin_mc[2] + origin_sd * sin(bearing)
destination_cx <- destination_mc[1] + destination_sd * cos(bearing)
destination_cy <- destination_mc[2] + destination_sd * sin(bearing)
origin_circle <- cbind(origin_cx, origin_cy)
destination_circle <- cbind(destination_cx, destination_cy)
lines(origin_circle, col='red', lwd=2)
lines(destination_circle, col='purple', lwd=2)
7.0 First-Order Spatial Point Patterns Analysis
After data wrangling is complete, we will perform first-order spatial point pattern analysis using functions from the spatstat package. As discussed in Section 2.0, first-order properties concern the characteristics of individual point locations and the variation of their density across space, and are mostly addressed by density-based techniques such as quadrat analysis and kernel density estimation.
Investigation of the intensity of a point pattern is one of the first and most important steps in point pattern analysis (Baddeley et al., 2015). If the point process has an intensity function λ(u), this function can be estimated non-parametrically by kernel estimation (Baddeley et al., 2015). Kernel estimation allows for smoothing of the probability density estimation of a random variable (in this analysis a point event) based on kernels as weights.
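Ignoring edge effects for a moment, the kernel estimate of the intensity at a location \(u\), given observed points \(x_1, \dots, x_n\), takes the form:

\[
\hat{\lambda}(u) = \frac{1}{e(u)} \sum_{i=1}^{n} \kappa(u - x_i),
\]

where \(\kappa\) is the kernel (a probability density whose spread is controlled by the bandwidth \(\sigma\)) and \(e(u)\) is an edge-correction factor that compensates for points lost outside the observation window.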
7.1 Rescaling origin_ppp_sg and destination_ppp_sg
The SVY21 Coordinate Reference System uses metres as its standard unit. Hence, the origin_ppp_sg and destination_ppp_sg objects that we prepared in the previous sections also use metres. However, we will convert the measurement unit from metres to kilometres when calculating the kernel density estimates for the entirety of Singapore, because kilometres provide a more appropriate scale for analysing large areas.
origin_ppp_sg.km <- rescale(origin_ppp_sg, 1000, "km")
destination_ppp_sg.km <- rescale(destination_ppp_sg, 1000, "km")
7.2 Computing Default Kernel Density Estimation
Kernel Density Estimation (KDE) generates a surface (raster) representing the estimated distribution of point events over the observation window. Each cell in the KDE layer carries a value representing the estimated density at that location (Wilkin, 2020). Hence, this approach is also known as the local density approach. To build the KDE layer, a localised density is calculated for multiple small subsets of the observation window. These subsets overlap, resulting in a moving window defined by a kernel (Wilkin, 2020; Gimond, 2023).
In this section, we will focus on destination points as we would like to identify areas that exert a ‘pull’ effect on people, hence resulting in cluster of trajectory destinations. Analyzing the destination of Grab trajectories can provide interesting insights into pull factors within a given area and help identify popular destinations or areas of high mobility demand.
Kernel estimation is implemented in spatstat by the function density.ppp(), a method for the generic command density.
par(mar = c(0,1,1,1))
kde_default_destination <- density(destination_ppp_sg.km)
plot(kde_default_destination,main = "Default Density KDE for Destination Points")
contour(kde_default_destination, add=TRUE)
The sigma argument of the density() function controls the bandwidth of the kernel function. The choice of bandwidth strongly affects the kernel density estimate. A smaller bandwidth produces a finer density estimate with many small peaks and valleys, while a larger bandwidth results in a smoother distribution of point densities. Generally speaking, if the bandwidth is too small the estimate is too noisy, while if the bandwidth is too large the estimate may miss crucial elements of the point pattern due to over-smoothing (Scott, 2009).

When the sigma value is not specified, an isotropic Gaussian kernel is used, with a default bandwidth calculated by a simple rule of thumb that depends only on the size of the window. Hence, the KDE produced by the default arguments may not be what we aim to achieve. Looking at the KDE plot created above, there are signs of over-smoothing: only a single spatial cluster in the CBD area is observable. This can severely limit our analysis, as potential small-scale clusters and other interesting details are masked by the over-smoothing effect.
To overcome this challenge, we can specify smoothing bandwidth through the argument sigma or kernel function through the argument kernel to compute and plot more intuitive and detailed KDE maps.
7.3 Creating KDE Layers with Fixed Bandwidth
7.3.1 Computing Fixed Bandwidths Using Different Bandwidth Selection Methods
density() function of spatstat allows us to compute a kernel density for a given set of point events.
bw.diggle() — Cross-Validated Bandwidth Selection for Kernel Density: the bandwidth σ is chosen to minimise the mean-square error criterion defined by Diggle (1985). The mean-square error is the average of the squared differences between the estimated values and the actual values.
bw.CvL() — Cronie and van Lieshout's Criterion for Bandwidth Selection for Kernel Density: the bandwidth σ is chosen to minimise the discrepancy between the area of the observation window and the sum of reciprocal estimated intensity values at the points of the point process. This method aims to choose a bandwidth that best represents the underlying point process, taking into account both the observed points and the area they occupy.
bw.scott() — Scott's Rule for Bandwidth Selection for Kernel Density: the bandwidth σ is computed by the rule of thumb of Scott (1992). The bandwidth is proportional to \(n^{-1/(d+4)}\), where n is the number of points and d is the number of spatial dimensions. This rule is very fast to compute, typically produces a larger bandwidth than Diggle's method, and is useful for estimating gradual trends.
bw.ppl() — Likelihood Cross-Validation Bandwidth Selection for Kernel Density: this approach, explained by Loader (1999), uses likelihood cross-validation to determine the bandwidth σ by maximising the point process likelihood. It is beneficial when the aim is to maximise the likelihood of observing the given data.
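For planar data (d = 2), Scott's rule of thumb reduces to a separate bandwidth per coordinate axis based on the sample standard deviations \(s_x\) and \(s_y\) (this is the convention I understand spatstat's bw.scott to follow, up to its internal constants):

\[
\sigma_x = s_x \, n^{-1/6}, \qquad \sigma_y = s_y \, n^{-1/6}
\]

This explains why bw.scott() returns two values below, while the cross-validation based methods return a single σ.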
bw_diggle <- bw.diggle(destination_ppp_sg.km)
bw_diggle
      sigma
0.008317447
bw_CvL <- bw.CvL(destination_ppp_sg.km)
bw_CvL
   sigma
3.745658
bw_scott <- bw.scott(destination_ppp_sg.km)
bw_scott
  sigma.x   sigma.y
1.4763217 0.9063352
bw_ppl <- bw.ppl(destination_ppp_sg.km)
bw_ppl
    sigma
0.1913655
We notice that bw_diggle, bw_CvL and bw_ppl all give a numeric sigma value, whereas bw_scott, by default, provides a separate bandwidth for each coordinate axis. In the code output above, sigma.x = 1.4763217 and sigma.y = 0.9063352 are the estimated bandwidths for the x and y coordinates, respectively. These values represent the amount of smoothing applied in each direction when estimating the kernel density.
We can specify the isotropic=TRUE argument when calling bw.scott() to produce a single bandwidth value.
bw_scott_single <- bw.scott(destination_ppp_sg.km, isotropic=TRUE)
bw_scott_single
   sigma
1.156738
The optimised bandwidth values generated by the above methods belong to the special class bw.optim. The plot function can be used to inspect the objective function of the optimisation that leads to each result.
par(mfrow = c(1,2))
plot(bw_diggle, xlim=c(-0.02,0.05), ylim=c(-60,200))
plot(bw_CvL)
par(mfrow = c(1,2))
plot(bw_scott)
plot(bw_ppl, xlim=c(-1,5), ylim=c(70000,130000))
7.3.2 Plotting Fixed-Bandwidth KDE Layers
In practice, there is no definitive method for choosing the KDE bandwidth. The literature outlines a diverse range of approaches to KDE bandwidth selection. According to Wolff and Asche (2009), the choice of bandwidth in many existing studies is made mostly by visually comparing different bandwidth settings.
Hence, we will now create KDE layers based on each bandwidth selection method and visualize them to have a better comparison of how distinct the resulting KDE layers are.
kde_diggle <- density(destination_ppp_sg.km, bw_diggle)
kde_CvL <- density(destination_ppp_sg.km, bw_CvL)
kde_scott <- density(destination_ppp_sg.km, bw_scott)
kde_ppl <- density(destination_ppp_sg.km, bw_ppl)
par(mar = c(1,1,1,1.5),mfrow = c(2,2))
plot(kde_diggle,main = "kde_diggle")
plot(kde_CvL,main = "kde_CvL")
plot(kde_scott,main = "kde_scott")
plot(kde_ppl,main = "kde_ppl")
Next, we will try to plot histograms to compare the distribution of KDE values obtained from density() function using different bandwidth selection methods.
par(mar = c(2,2,2,2),mfrow = c(2,2))
hist(kde_diggle,main = "kde_diggle")
hist(kde_CvL,main = "kde_CvL")
hist(kde_scott,main = "kde_scott")
hist(kde_ppl,main = "kde_ppl")
We can interpret the outputs as below:
kde_diggle: The sharp peak at the beginning indicates that the Diggle method for bandwidth selection has identified a high concentration of points in the first bin, while the remaining bins have little to no concentration. This may suggest that one specific area in our observation window exhibits much higher spatial clustering than the rest of the window.
kde_CvL: The more balanced distribution suggests that the CvL method for bandwidth selection is identifying a broader range of spatial point concentration. However, the bin sizes are quite small, which smooths out the overall distribution and masks some of the finer details.
kde_scott: The wider range of values and less sharp peak compared to kde_diggle indicate that the Scott method is capturing a wider range of spatial point concentrations, including both densely concentrated locations and moderately concentrated ones.
kde_ppl: The result is very similar to that of the Diggle method; the sharp peak at the beginning suggests a high concentration of points in a specific area, again indicating that one part of the observation window exhibits much higher spatial clustering than the rest.
Another approach to comparing the KDE layers is to calculate the standard error of each density estimate. The $SE element is used to extract the standard error of the density estimate from the output of the density() function.
dse_diggle <- density(destination_ppp_sg.km, bw_diggle, se=TRUE)$SE
dse_CvL <- density(destination_ppp_sg.km, bw_CvL, se=TRUE)$SE
dse_scott <- density(destination_ppp_sg.km, bw_scott, se=TRUE)$SE
dse_ppl <- density(destination_ppp_sg.km, bw_ppl, se=TRUE)$SE
par(mar = c(1,1,1,1.5),mfrow = c(2,2))
plot(dse_diggle,main = "standard error_diggle")
plot(dse_CvL,main = "standard error_CvL")
plot(dse_scott,main = "standard error_scott")
plot(dse_ppl,main = "standard error_ppl")
The standard error (SE) of the density estimate provides a measure of the uncertainty associated with the density estimate at each point. This can be useful for understanding the variability of density estimates, especially when comparing density estimates obtained using different bandwidths.
However, in many applications of KDE, the focus is often on the shape of the density estimate rather than its absolute value. In such cases, the standard error might not be as relevant. Hence, in this analysis, we will not use the standard error as a criterion for choosing the bandwidth.
7.3.3 Choosing Fixed KDE Bandwidth Selection Method
Having explored various fixed bandwidth selection methods for computing KDE values, and having plotted the respective KDE estimates, their distributions and associated standard errors, we will now select the KDE bandwidth to be used in our analysis. As seen in Section 7.3.1, each bandwidth method produced a distinct KDE, and there is no definitive way to choose among them.
We will proceed with the bw.scott method for further analysis. This is because:
- The bw.scott method provides a pair of bandwidth values, one per coordinate axis. This allows it to capture different levels of spatial clustering in each direction more accurately.
- The bw.scott method best balances bias and variance among all the methods. If the bandwidth is too small, the estimate is too noisy (high variance); the distribution histograms of the KDE layers using bw.diggle and bw.ppl tend to indicate this. On the other hand, if the bandwidth is too large, the estimate is over-smoothed and misses crucial elements of the point pattern (high bias), as observed in the distribution histogram of the KDE layer using bw.CvL.
Since we have chosen to use bw_scott method, now we will plot the KDE layer using this method for further analysis.
par(mar = c(0,1,1,1))
bw_fixed_scott <- bw.scott(destination_ppp_sg.km)
bw_fixed_scott
  sigma.x   sigma.y
1.4763217 0.9063352
kde_fixed_scott <- density(destination_ppp_sg.km, bw_fixed_scott)
plot(kde_fixed_scott,main = "Fixed-Bandwidth KDE for Grab Destination Points (Using bw_scott)")
contour(kde_fixed_scott, add=TRUE)
However, upon visual inspection, there are signs of a certain degree of over-smoothing when we directly use the bandwidth provided by the bw.scott method. Automatic bandwidth selection only provides a starting point, and further fine-tuning may be necessary based on the plot we have created above.
To do so, we will apply a rule-of-thumb adjustment, dividing the bandwidth value by 2 to reduce the bandwidth size and hence the possible over-smoothing effect.
par(mar = c(0,1,1,1))
kde_fixed_scott <- density(destination_ppp_sg.km, sigma=bw_fixed_scott/2)
plot(kde_fixed_scott,main = "Fixed-Bandwidth KDE for Grab Destination Points (Using bw_scott / 2)")
contour(kde_fixed_scott, add=TRUE)
Looking at the plot created, it appears that reducing the bandwidth (thus making the point cluster buffers smaller) has minimised the over-smoothing effect, while the Grab destination hotspot areas remain clearly observable. We will investigate these hotspots further in subsequent sections.
7.3.4 Kernel Function Selection for Fixed Bandwidth
Kernel functions are important because they control how points within the bandwidth are weighted. The default kernel in density.ppp() is gaussian, but there are other options: we can use epanechnikov, quartic or disc.
kde_fixed_scott.gaussian <- density(destination_ppp_sg.km,
sigma=bw_fixed_scott,
edge=TRUE,
kernel="gaussian")
kde_fixed_scott.epanechnikov <- density(destination_ppp_sg.km,
sigma=bw_fixed_scott,
edge=TRUE,
kernel="epanechnikov")
kde_fixed_scott.quartic <- density(destination_ppp_sg.km,
sigma=bw_fixed_scott,
edge=TRUE,
kernel="quartic")
kde_fixed_scott.disc <- density(destination_ppp_sg.km,
sigma=bw_fixed_scott,
edge=TRUE,
kernel="disc")
par(mar = c(1,1,1,1.5),mfrow = c(2,2))
plot(kde_fixed_scott.gaussian, main="Gaussian")
plot(kde_fixed_scott.epanechnikov, main="Epanechnikov")
plot(kde_fixed_scott.quartic, main="Quartic")
plot(kde_fixed_scott.disc, main="Disc")
7.4 Creating KDE Layers with Spatially Adaptive Bandwidth
Fixed bandwidth kernels are commonly employed in statistical literature due to their ease of implementation. However, their application in spatial datasets often yields suboptimal estimations due to a lack of spatial and temporal adaptability (González & Moraga, 2022). Consequently, the more intuitive ‘adaptive smoothing’ approach has emerged. In this technique, the amount of smoothing is inversely related to the density of the points.
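One standard way to formalise this inverse relationship is Abramson's (1982) rule, in which the local bandwidth at each point is inversely proportional to the square root of a pilot intensity estimate \(\tilde{\lambda}\) (this is one common choice; the exact rule used internally by each spatstat method may differ):

\[
\sigma(x_i) = \frac{\sigma_0}{\gamma}\,\tilde{\lambda}(x_i)^{-1/2},
\qquad
\gamma = \left(\prod_{i=1}^{n} \tilde{\lambda}(x_i)^{-1/2}\right)^{1/n}
\]

where \(\sigma_0\) is a global bandwidth and \(\gamma\) normalises by the geometric mean, so that dense areas receive small bandwidths (fine detail) and sparse areas receive large ones (heavier smoothing).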
In the spatstat package, there are three main approaches (Pebesma & Bivand, 2023) to creating KDE layers with spatially adaptive bandwidth.
Voronoi-Dirichlet Adaptive Density Estimate: It computes the intensity function estimate of a point pattern dataset by creating tessellations. By default, the input point pattern data is used to construct a Voronoi/Dirichlet tessellation (Barr and Schoenberg, 2010). The intensity estimate at a given location equals the reciprocal of the size of the Voronoi/Dirichlet cell containing that location.
Adaptive Kernel Density Estimate: It computes an estimate of the intensity function of a point pattern dataset using the partitioning technique of Davies and Baddeley (2018), which dynamically specifies the smoothing bandwidth applied to each point. This partitioning method accelerates computation by dividing the range of bandwidths into intervals, correspondingly subdividing the point pattern into sub-patterns according to bandwidth, and applying fixed-bandwidth smoothing to each sub-pattern.
Nearest-Neighbour Adaptive Density Estimate: It computes an estimate of the intensity function of a point pattern dataset using the distance from each spatial location to the kth nearest point (Cressie, 1991; Silverman, 1986; Burman & Nolan, 1989). The default value of k is the square root of the number of points in the dataset. This estimator of intensity is relatively fast to compute and is spatially adaptive. Some studies suggest the use of the nearest-neighbour distance as a suitable parameter for determining the bandwidth (Krisp et al., 2009).
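Ignoring edge correction, a common form of the kth nearest-neighbour intensity estimate at a location \(u\) is:

\[
\hat{\lambda}(u) = \frac{k}{\pi \, d_k(u)^2},
\]

where \(d_k(u)\) is the distance from \(u\) to the kth nearest data point, so the denominator is the area of the smallest disc around \(u\) containing k points. Smaller \(d_k(u)\) (points packed tightly) yields higher estimated intensity.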
7.4.1 Voronoi-Dirichlet Adaptive Density Estimate
The Dirichlet-Voronoï estimator is computed in spatstat by the function adaptive.density() with argument method="voronoi".
kde_destination_dirichlet_adaptive <- adaptive.density(destination_ppp_sg.km, f=1, method = "voronoi")
par(mar = c(0,1,1,1))
plot(kde_destination_dirichlet_adaptive,main = "Voronoi-Dirichlet Adaptive Density Estimate")
7.4.2 Adaptive Kernel Density Estimate
The Adaptive Kernel estimator is computed in spatstat by the function adaptive.density() with argument method="kernel".
kde_destination_kernel_adaptive <- adaptive.density(destination_ppp_sg.km, method = "kernel")
par(mar = c(0,1,1,1))
plot(kde_destination_kernel_adaptive,main = "Adaptive Kernel Density Estimate")
7.4.3 Nearest-Neighbour Density Estimate
The Nearest-Neighbour estimator is computed in spatstat by the function nndensity().
kde_adaptive_nn <- nndensity(destination_ppp_sg.km, k=10)
par(mar = c(0,1,1,1))
plot(kde_adaptive_nn,main = "Nearest-Neighbour Adaptive Density Estimate")
7.4.4 Choosing Adaptive KDE Method
Similar to what we did for fixed bandwidth, we can try to plot histograms to compare the distribution of KDE values obtained from density() function using different adaptive bandwidth selection methods.
par(mar = c(2,2,2,2),mfrow = c(2,2))
hist(kde_destination_dirichlet_adaptive,main = "Voronoi-Dirichlet Adaptive")
hist(kde_destination_kernel_adaptive,main = "Adaptive Kernel")
hist(kde_adaptive_nn,main = "Nearest-Neighbour Adaptive")
From the outputs, it seems that there is no significant difference between the distributions of KDE values obtained across the different methods. All three methods identified a high concentration of points in a specific area. Hence, we will choose the Adaptive Kernel method, as it provides the most balanced smoothing across areas of varying point density among the three.
7.5 Plotting Interactive KDE Maps
raster_kde_fixed_scott <- raster(kde_fixed_scott)
raster_kde_adaptive_nn <- raster(kde_adaptive_nn)
raster_kde_adaptive_kernel <- raster(kde_destination_kernel_adaptive)
projection(raster_kde_fixed_scott) <- CRS("+init=EPSG:3414 +units=km")
projection(raster_kde_adaptive_nn) <- CRS("+init=EPSG:3414 +units=km")
projection(raster_kde_adaptive_kernel) <- CRS("+init=EPSG:3414 +units=km")
tmap_mode('view')
kde_fixed_scott <- tm_basemap(server = "OpenStreetMap.HOT") +
tm_basemap(server = "Esri.WorldImagery") +
tm_shape(raster_kde_fixed_scott) +
tm_raster("layer",
n = 10,
title = "KDE_Fixed_scott",
alpha = 0.6,
palette = c("#fafac3","#fd953b","#f02a75","#b62385","#021c9e")) +
tm_shape(mpsz_sf)+
tm_polygons(alpha=0.1,id="PLN_AREA_N")+
tmap_options(check.and.fix = TRUE)
tmap_mode('view')
kde_adaptive_nn <- tm_basemap(server = "OpenStreetMap.HOT") +
tm_basemap(server = "Esri.WorldImagery") +
tm_shape(raster_kde_adaptive_nn) +
tm_raster("layer",
n = 7,
title = "KDE_Adaptive_nn",
style = "pretty",
alpha = 0.6,
palette = c("#fafac3","#fd953b","#f02a75","#b62385","#021c9e")) +
tm_shape(mpsz_sf)+
tm_polygons(alpha=0.1,id="PLN_AREA_N")+
tmap_options(check.and.fix = TRUE)
tmap_mode('view')
kde_adaptive_kernel <- tm_basemap(server = "OpenStreetMap.HOT") +
tm_basemap(server = "Esri.WorldImagery") +
tm_shape(raster_kde_adaptive_kernel) +
tm_raster("layer",
n = 7,
title = "KDE_Adaptive_Kernel",
style = "pretty",
alpha = 0.6,
palette = c("#fafac3","#fd953b","#f02a75","#b62385","#021c9e")) +
tm_shape(mpsz_sf)+
tm_polygons(alpha=0.1,id="PLN_AREA_N")+
tmap_options(check.and.fix = TRUE)
tmap_arrange(kde_fixed_scott, kde_adaptive_nn, kde_adaptive_kernel ,ncol=1,nrow=3,sync = TRUE)
7.6 Planning Area-Level Kernel Density Estimation
wl= mpsz_sf %>% filter(PLN_AREA_N == "WOODLANDS")
je = mpsz_sf %>% filter(PLN_AREA_N == "JURONG EAST")
jw = mpsz_sf %>% filter(PLN_AREA_N == "JURONG WEST")
tn = mpsz_sf %>% filter(PLN_AREA_N == "TAMPINES")
tp = mpsz_sf %>% filter(PLN_AREA_N == "TOA PAYOH")
pg = mpsz_sf %>% filter(PLN_AREA_N == "PUNGGOL")
par(mar = c(1,1,1,0),mfrow=c(2,3))
plot(st_geometry(wl), main = "Woodlands")
plot(st_geometry(je), main = "Jurong East")
plot(st_geometry(jw), main = "Jurong West")
plot(st_geometry(tn), main = "Tampines")
plot(st_geometry(tp), main = "Toa Payoh")
plot(st_geometry(pg), main = "Punggol")
wl_owin = as.owin(wl)
je_owin = as.owin(je)
jw_owin = as.owin(jw)
tn_owin = as.owin(tn)
tp_owin = as.owin(tp)
pg_owin = as.owin(pg)
destination_wl_ppp = destination_ppp_sg[wl_owin]
destination_je_ppp = destination_ppp_sg[je_owin]
destination_jw_ppp = destination_ppp_sg[jw_owin]
destination_tn_ppp = destination_ppp_sg[tn_owin]
destination_tp_ppp = destination_ppp_sg[tp_owin]
destination_pg_ppp = destination_ppp_sg[pg_owin]
wl_kde_scott <- density(destination_wl_ppp, sigma=bw.scott, main="Woodlands")
je_kde_scott <- density(destination_je_ppp, sigma=bw.scott, main="Jurong East")
jw_kde_scott <- density(destination_jw_ppp, sigma=bw.scott, main="Jurong West")
tn_kde_scott <- density(destination_tn_ppp, sigma=bw.scott, main="Tampines")
tp_kde_scott <- density(destination_tp_ppp, sigma=bw.scott, main="Toa Payoh")
pg_kde_scott <- density(destination_pg_ppp, sigma=bw.scott, main="Punggol")
par(mar = c(1,1,1,1.5),mfrow = c(3,2))
plot(wl_kde_scott,main = "KDE Woodlands")
contour(wl_kde_scott, add=TRUE)
plot(je_kde_scott,main = "KDE Jurong East")
contour(je_kde_scott, add=TRUE)
plot(jw_kde_scott,main = "KDE Jurong West")
contour(jw_kde_scott, add=TRUE)
plot(tn_kde_scott,main = "KDE Tampines")
contour(tn_kde_scott, add=TRUE)
plot(tp_kde_scott,main = "KDE Toa Payoh")
contour(tp_kde_scott, add=TRUE)
plot(pg_kde_scott,main = "KDE Punggol")
contour(pg_kde_scott, add=TRUE)
7.7 Spatial Randomness Test
Clark and Evans (1954) proposed a very simple test of spatial randomness based on the Clark–Evans aggregation index (R): the ratio of the observed mean nearest-neighbour distance in the pattern to that expected for a Poisson point process of the same intensity. A value R > 1 suggests ordering (regularity), while R < 1 suggests clustering.
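For a homogeneous Poisson process of intensity \(\lambda\), the expected mean nearest-neighbour distance is \(1/(2\sqrt{\lambda})\), so the index can be written as:

\[
R = \frac{\bar{d}_{obs}}{1/(2\sqrt{\lambda})} = 2\,\bar{d}_{obs}\sqrt{\lambda}
\]

where \(\bar{d}_{obs}\) is the observed mean nearest-neighbour distance. Complete spatial randomness gives R close to 1.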
We will perform the Clark–Evans test of aggregation for a spatial point pattern by using clarkevans.test() of spatstat.
The test hypotheses are:
H0: The trajectory origin points are randomly distributed.
H1: The trajectory origin points are not randomly distributed.
The test will be conducted at the 95% confidence level.
clarkevans.test(origin_ppp_sg,
correction="none",
clipregion=sg_owin,
alternative=c("clustered"),
nsim=99)
Clark-Evans test
No edge correction
Z-test
data: origin_ppp_sg
R = 0.27408, p-value < 2.2e-16
alternative hypothesis: clustered (R < 1)
The test hypotheses are:
H0: The trajectory destination points are randomly distributed.
H1: The trajectory destination points are not randomly distributed.
The test will be conducted at the 95% confidence level.
clarkevans.test(destination_ppp_sg,
correction="none",
clipregion=sg_owin,
alternative=c("clustered"),
nsim=99)
Clark-Evans test
No edge correction
Z-test
data: destination_ppp_sg
R = 0.29484, p-value < 2.2e-16
alternative hypothesis: clustered (R < 1)
8.0 Network Constrained Kernel Density Estimation (NKDE)
Many real-world point events are not randomly distributed. Rather, their distribution is constrained by networks such as roads, rivers and fault lines, to name just a few.
8.1 Extracting Road Networks for Focus Planning Areas
wl_network = st_intersection(sg_driving_sf,st_union(wl))
je_network = st_intersection(sg_driving_sf,st_union(je))
jw_network = st_intersection(sg_driving_sf,st_union(jw))
tn_network = st_intersection(sg_driving_sf,st_union(tn))
tp_network = st_intersection(sg_driving_sf,st_union(tp))
pg_network = st_intersection(sg_driving_sf,st_union(pg))
wl_destination = st_intersection(destination_sf,st_union(wl))
je_destination = st_intersection(destination_sf,st_union(je))
jw_destination = st_intersection(destination_sf,st_union(jw))
tn_destination = st_intersection(destination_sf,st_union(tn))
tp_destination = st_intersection(destination_sf,st_union(tp))
pg_destination = st_intersection(destination_sf,st_union(pg))
je_network
Simple feature collection with 4257 features and 10 fields
Geometry type: GEOMETRY
Dimension: XY
Bounding box: xmin: 14256.49 ymin: 30994.22 xmax: 19398.25 ymax: 37124.48
Projected CRS: SVY21 / Singapore TM
First 10 features:
osm_id code fclass name ref oneway maxspeed layer
518 22752251 5114 secondary Jurong East Central <NA> F 60 0
519 22752252 5115 tertiary Jurong East Avenue 1 <NA> F 50 0
520 22752253 5114 secondary Jurong East Central <NA> F 60 1
526 22752658 5114 secondary Toh Guan Road <NA> F 60 1
731 22885253 5115 tertiary Jurong East Street 31 <NA> F 50 0
817 22892522 5114 secondary West Coast Road <NA> F 60 0
825 22903881 5114 secondary Penjuru Road <NA> F 60 0
826 22903882 5114 secondary Penjuru Road <NA> F 60 0
827 22903883 5114 secondary Penjuru Road <NA> F 60 0
828 22903884 5114 secondary Penjuru Road <NA> F 60 0
bridge tunnel geometry
518 F F LINESTRING (18149.66 36108....
519 F F LINESTRING (17196.14 36155....
520 T F LINESTRING (18212.37 36194....
526 T F LINESTRING (18593.72 35974....
731 F F LINESTRING (16513.23 36360....
817 F F LINESTRING (17207.24 33643....
825 F F LINESTRING (17235.38 33767....
826 F F LINESTRING (17419.31 34348....
827 F F LINESTRING (17186.61 33509....
828 F F LINESTRING (17131.03 33306....
tmap_mode("plot")
tm_shape(st_union(wl)) +
tm_polygons(col="white") +
tm_shape(wl_destination) +
tm_dots() +
tm_shape(wl_network) +
tm_lines(col="fclass", palette ="Spectral") +
tm_layout(main.title = "Road Network and Destination Points in Woodlands",
main.title.position = "center",
main.title.size = 0.9,
legend.outside = TRUE,
frame = TRUE) +
tm_borders(alpha = 0.5)
tm_shape(st_union(je)) +
tm_polygons(col="white") +
tm_shape(je_destination) +
tm_dots() +
tm_shape(je_network) +
tm_lines(col="fclass", palette ="Spectral") +
tm_layout(main.title = "Road Network and Destination Points in Jurong East",
main.title.position = "center",
main.title.size = 0.9,
legend.outside = TRUE,
frame = TRUE) +
tm_borders(alpha = 0.5)
tm_shape(st_union(jw)) +
tm_polygons(col="white") +
tm_shape(jw_destination) +
tm_dots() +
tm_shape(jw_network) +
tm_lines(col="fclass", palette ="Spectral") +
tm_layout(main.title = "Road Network and Destination Points in Jurong West",
main.title.position = "center",
main.title.size = 0.9,
legend.outside = TRUE,
frame = TRUE) +
tm_borders(alpha = 0.5)
tm_shape(st_union(tn)) +
tm_polygons(col="white") +
tm_shape(tn_destination) +
tm_dots() +
tm_shape(tn_network) +
tm_lines(col="fclass", palette ="Spectral") +
tm_layout(main.title = "Road Network and Destination Points in Tampines",
main.title.position = "center",
main.title.size = 0.9,
legend.outside = TRUE,
frame = TRUE) +
tm_borders(alpha = 0.5)
tm_shape(st_union(tp)) +
tm_polygons(col="white") +
tm_shape(tp_destination) +
tm_dots() +
tm_shape(tp_network) +
tm_lines(col="fclass", palette ="Spectral") +
tm_layout(main.title = "Road Network and Destination Points in Toa Payoh",
main.title.position = "center",
main.title.size = 0.9,
legend.outside = TRUE,
frame = TRUE) +
tm_borders(alpha = 0.5)
tm_shape(st_union(pg)) +
tm_polygons(col="white") +
tm_shape(pg_destination) +
tm_dots() +
tm_shape(pg_network) +
tm_lines(col="fclass", palette ="Spectral") +
tm_layout(main.title = "Road Network and Destination Points in Punggol",
main.title.position = "center",
main.title.size = 0.9,
legend.outside = TRUE,
frame = TRUE) +
tm_borders(alpha = 0.5)
8.3 Data Preparation
For illustrative purposes, I'll use "Punggol" as an example to demonstrate the data preparation step by step. Maps for the remaining planning areas are created at the end.
The spNetwork package provides the nkde() function, which is specifically designed to implement Network-Constrained Kernel Density Estimation (NetKDE).
Its three main inputs are:
- lines: a "SpatialLinesDataFrame" (an object defined in the sp package) representing the lines of the network,
- events: a "SpatialPointsDataFrame" representing the realisations of the spatial process, and
- samples: a "SpatialPointsDataFrame" giving the locations where the density must be estimated.
Recent releases of spNetwork also accept sf objects directly, which is what we pass below.
8.3.1 Preparing the lixels objects
Before computing NetKDE, the lines object needs to be cut into lixels with a specified minimal distance. This can be done with lixelize_lines() of spNetwork.
pg_lixels <- lixelize_lines(pg_network,
700,
mindist = 350)
The length of a lixel, lx_length, is set to 700 m, and
the minimum length of a lixel, mindist, is set to 350 m.
After cutting, if the length of the final lixel is shorter than the minimum distance, it is added to the previous lixel. If NULL, then mindist = maxdist/10. Note also that segments already shorter than the minimum distance are not modified.
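To make this cutting-and-merging rule concrete, here is a small self-contained sketch. The toy line and its coordinates are invented for illustration; it assumes sf and spNetwork are loaded, as in the setup chunk.

```r
# a single straight 1000 m line in the SVY21 projected CRS (EPSG:3414)
toy_line <- sf::st_sf(
  id = 1,
  geometry = sf::st_sfc(
    sf::st_linestring(matrix(c(0, 0, 1000, 0), ncol = 2, byrow = TRUE)),
    crs = 3414)
)
# cut every 300 m with a 150 m minimum: the leftover 100 m piece is shorter
# than mindist, so it should be merged into the previous lixel,
# leaving three lixels rather than four
toy_lixels <- spNetwork::lixelize_lines(toy_line, 300, mindist = 150)
sf::st_length(toy_lixels)
```

The same rule scales directly to the real pg_network call above, where lx_length = 700 and mindist = 350.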
8.3.2 Generating line centre points
Next, lines_center() of spNetwork will be used to generate a SpatialPointsDataFrame (i.e. samples) of line centre points. Each point is located at the centre of its line, based on the line's length.
pg_samples <- lines_center(pg_lixels)
8.4 Performing NetKDE
8.4.1 Computing NetKDE densities
We are ready to compute the NetKDE.
pg_nkde_simple <- nkde(pg_network,
events = pg_destination,
w = rep(1,nrow(pg_destination)),
samples = pg_samples,
kernel_name = "quartic",
bw = 300,
div= "bw",
method = "simple",
digits = 1,
tol = 1,
grid_shape = c(1,1),
max_depth = 8,
agg = 5,
sparse = TRUE,
verbose = FALSE)
The parameters method, kernel_name and bw indicate, respectively, which NKDE to use (simple, discontinuous, or continuous), which kernel function to use (gaussian, quartic, triweight, etc.), and the bandwidth of the kernel.
With agg, events can be aggregated and their weights summed within a threshold distance. In many applications, a small distance between events is not meaningful; aggregating them simplifies the network and limits the number of iterations when calculating the NKDE, effectively reducing computation time.
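For reference, the quartic kernel named above has a simple closed form: K(u) = (15/16)(1 − u²)² for |u| ≤ 1 and 0 otherwise, where u = d/bw. A minimal base-R sketch follows; the function name quartic_kernel is my own illustration, not an spNetwork export.

```r
# quartic (biweight) kernel weight for a distance d and bandwidth bw
quartic_kernel <- function(d, bw) {
  u <- d / bw
  ifelse(abs(u) <= 1, (15 / 16) * (1 - u^2)^2, 0)
}
# the weight decays smoothly with distance and reaches exactly 0
# at the bandwidth (300 m here), so events beyond bw contribute nothing
quartic_kernel(c(0, 150, 300, 450), bw = 300)
```

This is why bw = 300 in the call above means that only events within 300 network metres of a sample point influence its density.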
pg_nkde_discontinuous <- nkde(pg_network,
events = pg_destination,
w = rep(1,nrow(pg_destination)),
samples = pg_samples,
kernel_name = "quartic",
bw = 200,
div= "bw",
method = "discontinuous",
digits = 1,
tol = 1,
grid_shape = c(1,1),
max_depth = 8,
agg = 5,
sparse = TRUE,
verbose = FALSE)
pg_nkde_continuous <- nkde(pg_network,
events = pg_destination,
w = rep(1,nrow(pg_destination)),
samples = pg_samples,
kernel_name = "quartic",
bw = 200,
div= "bw",
method = "continuous",
digits = 1,
tol = 1,
grid_shape = c(1,1),
max_depth = 8,
agg = 5,
sparse = TRUE,
verbose = FALSE)
8.4.2 Visualising NetKDE densities
Before we can visualise the NetKDE values, the code chunk below inserts the computed density values (i.e. densities) into the samples and lixels objects as density fields. As in the KDE section, we also rescale the density values: multiplying by 1000 converts the unit from events per metre to events per kilometre.
pg_samples$nkde_simple <- pg_nkde_simple*1000
pg_lixels$nkde_simple <- pg_nkde_simple*1000
pg_samples$nkde_discontinuous <- pg_nkde_discontinuous*1000
pg_lixels$nkde_discontinuous <- pg_nkde_discontinuous*1000
pg_samples$nkde_continuous <- pg_nkde_continuous*1000
pg_lixels$nkde_continuous <- pg_nkde_continuous*1000
Now we can plot the NetKDE maps using tmap.
tmap_mode('view')
pg_nkde_simple_map <- tm_basemap(server = "Esri.WorldTopoMap") +
tm_basemap(server = "Esri.WorldGrayCanvas")+
tm_basemap(server = "OpenStreetMap") +
tm_shape(pg_lixels)+
tm_lines(col="nkde_simple", lwd = 2, palette =c("#fafac3","#fd953b","#f02a75","#b62385","#021c9e"))+
tm_shape(pg_destination)+
tm_dots(size=0.01)
pg_nkde_discontinuous_map <- tm_basemap(server = "Esri.WorldTopoMap") +
tm_basemap(server = "Esri.WorldGrayCanvas")+
tm_basemap(server = "OpenStreetMap") +
tm_shape(pg_lixels)+
tm_lines(col="nkde_discontinuous", lwd = 2, palette =c("#fafac3","#fd953b","#f02a75","#b62385","#021c9e"))+
tm_shape(pg_destination)+
tm_dots(size=0.01)
pg_nkde_continuous_map <- tm_basemap(server = "Esri.WorldTopoMap") +
tm_basemap(server = "Esri.WorldGrayCanvas")+
tm_basemap(server = "OpenStreetMap") +
tm_shape(pg_lixels)+
tm_lines(col="nkde_continuous", lwd = 2, palette =c("#fafac3","#fd953b","#f02a75","#b62385","#021c9e"))+
tm_shape(pg_destination)+
tm_dots(size=0.01)
tmap_arrange(pg_nkde_simple_map, pg_nkde_discontinuous_map, pg_nkde_continuous_map, ncol=3, nrow=1, sync = TRUE)
We can also compare with the traditional KDE below.
raster_pg_kde_scott = raster(pg_kde_scott)
projection(raster_pg_kde_scott) <- CRS("+init=EPSG:3414 +units=km")
tmap_mode('view')
tm_basemap(server = "Esri.WorldGrayCanvas") +
tm_shape(raster_pg_kde_scott)+
tm_raster("layer",
n = 7,
title = "KDE_Adaptive_Kernel",
style = "pretty",
alpha = 0.6,
palette = c("#fafac3","#fd953b","#f02a75","#b62385","#021c9e"))
8.5 Plotting NetKDE for Other Planning Areas
unique_types <- unique(st_geometry_type(tn_network))
# debugging: filter roads_tampines to keep only linestrings
if ("LINESTRING" %in% unique_types) {
tn_network <- st_cast(tn_network, "LINESTRING")
} else {
# handle the case when no linestrings are found
stop("No linestrings found in roads_tampines.")
}
tn_network
Simple feature collection with 6726 features and 10 fields
Geometry type: LINESTRING
Dimension: XY
Bounding box: xmin: 38247.79 ymin: 33132.85 xmax: 44844.97 ymax: 39749.52
Projected CRS: SVY21 / Singapore TM
First 10 features:
osm_id code fclass name ref oneway maxspeed
235 22616618 5113 primary Upper Changi Road East <NA> F 60
236 22616708 5113 primary Upper Changi Road East <NA> F 60
237 22616711 5113 primary Upper Changi Road East <NA> F 60
244 22617435 5113 primary Xilin Avenue <NA> F 70
245 22617439 5113 primary Xilin Avenue <NA> F 50
246 22617440 5133 primary_link <NA> <NA> F 50
247 22617443 5113 primary Tanah Merah Coast Road <NA> F 50
248 22617620 5113 primary Tanah Merah Flyover <NA> F 50
251 22650828 5113 primary Simei Avenue <NA> F 70
252 22650830 5113 primary Simei Avenue <NA> F 70
layer bridge tunnel geometry
235 0 F F LINESTRING (42530.62 36845....
236 1 T F LINESTRING (42519.51 36852....
237 1 T F LINESTRING (42552.71 36914....
244 0 F F LINESTRING (41538.19 34954....
245 0 F F LINESTRING (43345.58 34082....
246 0 F F LINESTRING (43611.26 34061....
247 0 F F LINESTRING (43091.32 34284....
248 1 T F LINESTRING (43441.49 34050....
251 0 F F LINESTRING (41527.24 34944....
252 0 F F LINESTRING (41520.66 34964....
tn_lixels <- lixelize_lines(tn_network,
700,
mindist = 350)
tn_samples <- lines_center(tn_lixels)
tn_densities <- nkde(tn_network,
events = tn_destination,
w = rep(1,nrow(tn_destination)),
samples = tn_samples,
kernel_name = "quartic",
bw = 300,
div= "bw",
method = "simple",
digits = 1,
tol = 1,
grid_shape = c(1,1),
max_depth = 8,
agg = 5, #we aggregate events within a 5m radius (faster calculation)
sparse = TRUE,
verbose = FALSE)
tn_samples$nkde_density <- tn_densities*1000
tn_lixels$nkde_density <- tn_densities*1000
tmap_mode('view')
tm_basemap(server = "Esri.WorldTopoMap") +
tm_basemap(server = "Esri.WorldGrayCanvas")+
tm_basemap(server = "OpenStreetMap") +
tm_shape(tn_lixels)+
tm_lines(col="nkde_density", lwd = 2, palette =c("#fafac3","#fd953b","#f02a75","#b62385","#021c9e"))+
tm_shape(tn_destination)+
tm_dots(size=0.01)
9.0 Temporal Network Kernel Density Estimation (TNKDE)
9.1 Visualising Frequency Distribution
origin_day <- ggplot(data=origin_sf,
aes(x=weekday)) +
geom_bar()
destination_day <- ggplot(data=destination_sf,
aes(x=weekday)) +
geom_bar()
origin_day + destination_day
9.2 Plotting Frequency of Trip Origination By Hour
origin_sf$pickup_hr_num <- as.numeric(origin_sf$pickup_hr)
pickup_hr_num <- origin_sf$pickup_hr_num
ggplot(data=origin_sf,
aes(x=pickup_hr_num)) +
geom_bar() +
scale_x_continuous(breaks = pickup_hr_num)
9.3 Plotting Frequency of Trip Destination By Hour
destination_sf$dropoff_hr_num <- as.numeric(destination_sf$dropoff_hr)
dropoff_hr_num <- destination_sf$dropoff_hr_num
ggplot(data=destination_sf,
aes(x=dropoff_hr_num)) +
geom_bar() +
scale_x_continuous(breaks = dropoff_hr_num)
9.4 Bandwidth Selection
w <- rep(1,nrow(destination_sf))
samples <- seq(0, max(destination_sf$dropoff_hr_num), 0.5)
bw_bcv <- bw.bcv(destination_sf$dropoff_hr_num)
bw_ucv <- bw.ucv(destination_sf$dropoff_hr_num)
bw_SJ <- bw.SJ(destination_sf$dropoff_hr_num)
time_kernel_values <- data.frame(
bw_bcv = tkde(destination_sf$dropoff_hr_num, w = w, samples = samples, bw = bw_bcv, kernel_name = "quartic"),
bw_ucv = tkde(destination_sf$dropoff_hr_num, w = w, samples = samples, bw = bw_ucv, kernel_name = "quartic"),
bw_SJ = tkde(destination_sf$dropoff_hr_num, w = w, samples = samples, bw = bw_SJ, kernel_name = "quartic"),
time = samples
)
df_time <- reshape2::melt(time_kernel_values,id.vars = "time")
df_time$variable <- as.factor(df_time$variable)
ggplot(data = df_time) +
geom_line(aes(x = time, y = value)) +
scale_x_continuous(breaks = dropoff_hr_num) +
facet_wrap(vars(variable), ncol=2, scales = "free") +
theme(axis.text = element_text(size = 5))
9.5 TNKDE Implementing at Punggol Planning Area
w_pg <- rep(1,nrow(pg_destination))
samples_pg <- seq(0, max(pg_destination$dropoff_hr_num), 0.5)
bw_bcv <- bw.bcv(pg_destination$dropoff_hr_num)
bw_ucv <- bw.ucv(pg_destination$dropoff_hr_num)
bw_SJ <- bw.SJ(pg_destination$dropoff_hr_num)
time_kernel_values <- data.frame(
bw_bcv = tkde(pg_destination$dropoff_hr_num, w = w_pg, samples = samples_pg, bw = bw_bcv, kernel_name = "quartic"),
bw_ucv = tkde(pg_destination$dropoff_hr_num, w = w_pg, samples = samples_pg, bw = bw_ucv, kernel_name = "quartic"),
bw_SJ = tkde(pg_destination$dropoff_hr_num, w = w_pg, samples = samples_pg, bw = bw_SJ, kernel_name = "quartic"),
time = samples_pg
)
df_time <- reshape2::melt(time_kernel_values,id.vars = "time")
df_time$variable <- as.factor(df_time$variable)
ggplot(data = df_time) +
geom_line(aes(x = time, y = value)) +
scale_x_continuous(breaks = dropoff_hr_num) +
facet_wrap(vars(variable), ncol=2, scales = "free") +
theme(axis.text = element_text(size = 5))
cv_scores <- bws_tnkde_cv_likelihood_calc(
bw_net_range = c(100,1000),
bw_net_step = 100,
bw_time_range = c(3,24),
bw_time_step = 3,
lines = pg_network,
events = pg_destination,
time_field = "dropoff_hr_num",
w = rep(1, nrow(pg_destination)),
kernel_name = "quartic",
method = "discontinuous",
diggle_correction = FALSE,
study_area = NULL,
max_depth = 10,
digits = 2,
tol = 0.1,
agg = 15,
sparse=TRUE,
grid_shape=c(1,1),
sub_sample=1,
verbose = FALSE,
check = TRUE)
knitr::kable(cv_scores)
| bw_net (m) \ bw_time (h) | 3 | 6 | 9 | 12 | 15 | 18 | 21 | 24 |
|---|---|---|---|---|---|---|---|---|
| 100 | -250.91794 | -182.71584 | -155.27632 | -116.78531 | -113.26649 | -102.39516 | -93.35400 | -89.81173 |
| 200 | -145.76563 | -90.84170 | -69.02605 | -61.88347 | -56.56890 | -49.40677 | -49.57432 | -49.73512 |
| 300 | -100.00061 | -65.40228 | -47.32325 | -43.88449 | -40.43182 | -36.94576 | -37.12520 | -37.29583 |
| 400 | -68.95819 | -41.82902 | -32.92989 | -31.35903 | -29.73603 | -29.93138 | -30.11827 | -30.29386 |
| 500 | -54.37326 | -34.65375 | -29.47804 | -29.75000 | -28.14349 | -28.34934 | -28.54072 | -28.71939 |
| 600 | -47.21774 | -34.83465 | -29.69234 | -29.97549 | -28.37665 | -28.58649 | -28.77993 | -28.96001 |
| 700 | -36.42261 | -29.55281 | -28.07569 | -28.36425 | -28.59379 | -28.80594 | -29.00069 | -29.18167 |
| 800 | -32.88500 | -27.88168 | -28.25843 | -28.55089 | -28.79093 | -29.00503 | -29.20083 | -29.38249 |
| 900 | -31.20502 | -28.04346 | -28.43185 | -28.72756 | -28.97115 | -29.18680 | -29.38338 | -29.56554 |
| 1000 | -31.33948 | -28.19916 | -28.59400 | -28.89218 | -29.13776 | -29.35455 | -29.55170 | -29.73421 |
According to the leave-one-out cross-validation scores above (less negative is better), the optimal pair of bandwidths is 800 metres and 6 hours. As expected, fairly large bandwidths are required because the events are spread out in both space and time.
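Rather than reading the optimum off the table by eye, we can look it up programmatically. A short sketch, assuming cv_scores is (or can be coerced to) a numeric matrix with network bandwidths as row names and time bandwidths as column names, as returned above:

```r
cv_mat <- as.matrix(cv_scores)
# the likelihood scores are negative, so "best" means closest to zero
best <- which(cv_mat == max(cv_mat), arr.ind = TRUE)
bw_net_opt  <- rownames(cv_mat)[best[1, "row"]]
bw_time_opt <- colnames(cv_mat)[best[1, "col"]]
c(bw_net = bw_net_opt, bw_time = bw_time_opt)
```

For the table above this returns the 800 m / 6 h pair identified earlier.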
# choosing sample in times (every 1 hr)
pg_sample_time <- seq(0, max(pg_destination$dropoff_hr_num), 1.0)
# calculating densities
tnkde_densities <- tnkde(lines = pg_network,
events = pg_destination,
time_field = "dropoff_hr_num",
w = rep(1, nrow(pg_destination)),
samples_loc = pg_samples,
samples_time = pg_sample_time,
kernel_name = "quartic",
bw_net = 700, bw_time = 60,
adaptive = TRUE,
trim_bw_net = 900,
trim_bw_time = 80,
method = "discontinuous",
div = "bw", max_depth = 10,
digits = 2, tol = 0.01,
agg = 15, grid_shape = c(1,1),
verbose = FALSE)
# creating a color palette for all the densities
library(classInt)
library(viridis)
all_densities <- c(tnkde_densities$k)
color_breaks <- classIntervals(all_densities, n = 10, style = "kmeans")
# generating a map at each sample time
all_maps <- lapply(1:ncol(tnkde_densities$k), function(i){
time <- pg_sample_time[[i]]
pg_samples$tnkde_density <- tnkde_densities$k[,i]
map1 <- tm_shape(pg_samples) +
tm_dots(col = "tnkde_density", size = 0.01,
breaks = color_breaks$brks, palette = viridis(10)) +
tm_layout(legend.show=FALSE, main.title = as.character(time), main.title.size = 0.5)
return(map1)
})
# creating a gif with all the maps
tmap_animation(all_maps, filename = "images/animated_ma.gif",
width = 1000, height = 1000, dpi = 300, delay = 50)
References
Baddeley, A., Rubak, E., & Turner, R. (2015). Spatial Point Patterns: Methodology and Applications with R (1st ed.). Chapman and Hall/CRC. https://doi.org/10.1201/b19708
Boots, B. N., & Getis, A. (1988). Point Pattern Analysis. Reprint, edited by G. I. Thrall. WVU Research Repository.
Floch, J.-M., Marcon, E., & Puech, F. (n.d.). Spatial distribution of points. In M.-P. de Bellefon (Ed.), Handbook of Spatial Analysis: Theory and Application with R (pp. 72–111). Insee-Eurostat.
Gimond, M. (2023). Chapter 11: Point Pattern Analysis. Retrieved from https://mgimond.github.io/Spatial/index.html
Kam, T. S. (2022). R for Geospatial Data Science and Analytics. Retrieved from https://r4gdsa.netlify.app/
Pebesma, E., & Bivand, R. (2023). Spatial Data Science: With Applications in R (1st ed.). Chapman and Hall/CRC. https://doi.org/10.1201/9780429459016
Rey, S. J., Arribas-Bel, D., & Wolf, L. J. (2023). Point Pattern Analysis. In Geographic Data Science with Python. CRC Press.
Wilkin, J. (2020). Geocomputation 2020–2021 Work Book. University College London. Retrieved from https://jo-wilkin.github.io/GEOG0030/coursebook/analysing-spatial-patterns-iii-point-pattern-analysis.html
Yuan, Y., Qiang, Y., Bin Asad, K., & Chow, T. E. (2020). Point Pattern Analysis. In J. P. Wilson (Ed.), The Geographic Information Science & Technology Body of Knowledge (1st Quarter 2020 Edition). https://doi.org/10.22224/gistbok/2020.1.13